Confidence Estimation for Information Extraction

نویسندگان

  • Aron Culotta
  • Andrew McCallum
چکیده

Information extraction techniques automatically create structured databases from unstructured data sources, such as the Web or newswire documents. Despite the successes of these systems, accuracy will always be imperfect. For many reasons, it is highly desirable to accurately estimate the confidence the system has in the correctness of each extracted field. The information extraction system we evaluate is based on a linear-chain conditional random field (CRF), a probabilistic model which has performed well on information extraction tasks because of its ability to capture arbitrary, overlapping features of the input in a Markov model. We implement several techniques to estimate the confidence of both extracted fields and entire multi-field records, obtaining an average precision of 98% for retrieving correct fields and 87% for multi-field records.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests

A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function.  As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...

متن کامل

Towards Confidence Estimation for Typed Protein-Protein Relation Extraction

Systems which build on top of information extraction are typically challenged to extract knowledge that, while correct, is not yet well-known. We hypothesize that a good confidence measure for relational information has the property that such interesting information is found between information extracted with very high confidence and very low confidence. We discuss confidence estimation for the...

متن کامل

Improved word confidence estimation using long range features

This paper describes experiments in improving word confidence estimation using documentand task-level features of the hypothesized word sequence from a recognizer. The improved confidence estimates are shown to improve information extraction performance, specifically named entity (NE) recognition. The detected names can then be used to further improve confidence estimation in a multi-pass NE re...

متن کامل

Interval Estimation for the Exponential Distribution under Progressive Type-II Censored Step-Stress Accelerated Life-Testing Model Based on Fisher Information

This paper, determines the confidence interval using the Fisher information under progressive type-II censoring for the k-step exponential step-stress accelerated life testing. We study the performance of these confidence intervals. Finally an example is given to illustrate the proposed procedures.

متن کامل

Confidence Estimation for Knowledge Base Population

Information extraction systems automatically extract structured information from machine-readable documents, such as newswire, web, and multimedia. Despite significant improvement, the performance is far from perfect. Hence, it is useful to accurately estimate confidence in the correctness of the extracted information. Using the Knowledge Base Population Slot Filling task as a case study, we pr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004